Overview

Dataset statistics

Number of variables: 16
Number of observations: 114
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 6
Duplicate rows (%): 5.3%
Total size in memory: 14.4 KiB
Average record size in memory: 129.1 B

Variable types

Numeric: 16

Warnings

Duplicates: dataset has 6 (5.3%) duplicate rows

Reproduction

Analysis started: 2021-05-03 01:34:54.616693
Analysis finished: 2021-05-03 01:35:42.398531
Duration: 47.78 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

SI_1
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.736842105
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
median: 5
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.918944361
Coefficient of variation (CV): 0.4051104763
Kurtosis: -0.681637431
Mean: 4.736842105
Median Absolute Deviation (MAD): 1
Skewness: -0.5422972309
Sum: 540
Variance: 3.682347462
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      28     24.6%
4      22     19.3%
5      21     18.4%
6      17     14.9%
1      12     10.5%
3      9      7.9%
2      5      4.4%
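
As a cross-check, the summary statistics reported for SI_1 can be recomputed from its value counts. The sketch below is illustrative only: the helper name `describe_counts` is hypothetical, and it assumes the sample variance (dividing by n - 1), which is consistent with the figures reported above but is not stated in the report itself.

```python
import math

# Value -> count for SI_1, copied from the frequency table above.
si_1_counts = {1: 12, 2: 5, 3: 9, 4: 22, 5: 21, 6: 17, 7: 28}


def describe_counts(counts):
    """Summarise a value -> count mapping (hypothetical helper)."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    # Assumption: sample variance, i.e. squared deviations divided by n - 1.
    squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
    variance = squared_dev / (n - 1)
    std = math.sqrt(variance)
    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation = std / mean
    }


stats = describe_counts(si_1_counts)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The same helper can be applied to the value counts of any other variable in this section, since each takes only the integer values 1 to 7.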

SI_2
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.026315789
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 5
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.700913478
Coefficient of variation (CV): 0.3384016344
Kurtosis: -0.7650009428
Mean: 5.026315789
Median Absolute Deviation (MAD): 1.5
Skewness: -0.4151710338
Sum: 573
Variance: 2.89310666
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      34     29.8%
5      24     21.1%
4      20     17.5%
3      14     12.3%
6      13     11.4%
2      6      5.3%
1      3      2.6%

SI_3
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.078947368
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
median: 5
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.829670587
Coefficient of variation (CV): 0.3602460223
Kurtosis: -0.3160360933
Mean: 5.078947368
Median Absolute Deviation (MAD): 1.5
Skewness: -0.7450725156
Sum: 579
Variance: 3.347694457
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      35     30.7%
6      20     17.5%
5      19     16.7%
4      18     15.8%
3      12     10.5%
1      9      7.9%
2      1      0.9%

PEB_1
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.201754386
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 6
median: 7
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.145826012
Coefficient of variation (CV): 0.1847583668
Kurtosis: 3.428308691
Mean: 6.201754386
Median Absolute Deviation (MAD): 0
Skewness: -1.734802658
Sum: 707
Variance: 1.312917249
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
7      63     55.3%
6      28     24.6%
5      11     9.6%
4      9      7.9%
3      2      1.8%
1      1      0.9%

PEB_2
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.51754386
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 5
median: 6
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.371211312
Coefficient of variation (CV): 0.2485184254
Kurtosis: 0.7958243268
Mean: 5.51754386
Median Absolute Deviation (MAD): 1
Skewness: -0.8967931752
Sum: 629
Variance: 1.880220463
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      34     29.8%
6      30     26.3%
4      23     20.2%
5      22     19.3%
2      2      1.8%
1      2      1.8%
3      1      0.9%

PEB_3
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.526315789
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3.25
median: 5
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2.75

Descriptive statistics

Standard deviation: 1.552676971
Coefficient of variation (CV): 0.3430332842
Kurtosis: -0.6040647793
Mean: 4.526315789
Median Absolute Deviation (MAD): 1
Skewness: -0.338720305
Sum: 516
Variance: 2.410805776
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
5      32     28.1%
6      21     18.4%
4      21     18.4%
3      14     12.3%
2      12     10.5%
7      11     9.6%
1      3      2.6%

PEB_4
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.833333333
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 5
median: 6
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.356183261
Coefficient of variation (CV): 0.2324885591
Kurtosis: 1.976087233
Mean: 5.833333333
Median Absolute Deviation (MAD): 1
Skewness: -1.424824613
Sum: 665
Variance: 1.839233038
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      45     39.5%
6      36     31.6%
5      15     13.2%
4      10     8.8%
3      5      4.4%
1      2      1.8%
2      1      0.9%

PEB_5
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.035087719
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 5
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.650582223
Coefficient of variation (CV): 0.3278159815
Kurtosis: -0.6440010142
Mean: 5.035087719
Median Absolute Deviation (MAD): 1
Skewness: -0.5256289397
Sum: 574
Variance: 2.724421674
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      27     23.7%
6      26     22.8%
5      20     17.5%
4      17     14.9%
3      16     14.0%
2      5      4.4%
1      3      2.6%

SoIn_1
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.728070175
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 5
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.974550863
Coefficient of variation (CV): 0.4176230025
Kurtosis: -0.9511002935
Mean: 4.728070175
Median Absolute Deviation (MAD): 2
Skewness: -0.4765546009
Sum: 539
Variance: 3.89885111
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      30     26.3%
5      20     17.5%
6      17     14.9%
4      17     14.9%
2      11     9.6%
1      10     8.8%
3      9      7.9%

SoIn_2
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.096491228
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 5
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.601820304
Coefficient of variation (CV): 0.3142986484
Kurtosis: 0.1461860929
Mean: 5.096491228
Median Absolute Deviation (MAD): 1
Skewness: -0.7386490647
Sum: 581
Variance: 2.565828288
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      27     23.7%
5      26     22.8%
4      24     21.1%
6      23     20.2%
3      5      4.4%
1      5      4.4%
2      4      3.5%

SoIn_3
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.070175439
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.06838528
Coefficient of variation (CV): 0.5081808663
Kurtosis: -1.160653618
Mean: 4.070175439
Median Absolute Deviation (MAD): 2
Skewness: -0.07717464013
Sum: 464
Variance: 4.278217668
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
4      29     25.4%
7      21     18.4%
1      21     18.4%
6      12     10.5%
5      12     10.5%
3      10     8.8%
2      9      7.9%

SoIn_4
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.51754386
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 5
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.924377643
Coefficient of variation (CV): 0.4259787404
Kurtosis: -0.9485080188
Mean: 4.51754386
Median Absolute Deviation (MAD): 2
Skewness: -0.3040169889
Sum: 515
Variance: 3.703229312
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      25     21.9%
4      22     19.3%
5      20     17.5%
6      14     12.3%
3      13     11.4%
1      11     9.6%
2      9      7.9%

SoIn_5
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.728070175
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3.25
median: 5
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2.75

Descriptive statistics

Standard deviation: 1.830349296
Coefficient of variation (CV): 0.3871239697
Kurtosis: -0.7527146363
Mean: 4.728070175
Median Absolute Deviation (MAD): 1
Skewness: -0.4346237233
Sum: 539
Variance: 3.350178544
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      26     22.8%
5      23     20.2%
4      19     16.7%
6      17     14.9%
3      14     12.3%
1      8      7.0%
2      7      6.1%

CA_1
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.01754386
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3.65
Q1: 5
median: 6.5
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.316891326
Coefficient of variation (CV): 0.2188419987
Kurtosis: 3.016947664
Mean: 6.01754386
Median Absolute Deviation (MAD): 0.5
Skewness: -1.665691721
Sum: 686
Variance: 1.734202764
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      57     50.0%
6      26     22.8%
5      18     15.8%
4      7      6.1%
3      3      2.6%
1      2      1.8%
2      1      0.9%

CA_2
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.640350877
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 5
median: 6
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.494101636
Coefficient of variation (CV): 0.2648951579
Kurtosis: 0.9988147182
Mean: 5.640350877
Median Absolute Deviation (MAD): 1
Skewness: -1.176206542
Sum: 643
Variance: 2.232339699
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      43     37.7%
6      28     24.6%
5      21     18.4%
4      10     8.8%
3      8      7.0%
1      3      2.6%
2      1      0.9%

CA_3
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.00877193
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3.65
Q1: 6
median: 6
Q3: 7
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.251516701
Coefficient of variation (CV): 0.2082816116
Kurtosis: 3.57599602
Mean: 6.00877193
Median Absolute Deviation (MAD): 1
Skewness: -1.726198675
Sum: 685
Variance: 1.566294054
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
7      51     44.7%
6      35     30.7%
5      16     14.0%
4      6      5.3%
3      4      3.5%
1      2      1.8%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
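
The calculation described above can be sketched in plain Python. This is a minimal illustration on made-up data, not the code the report tool runs; the function name `pearson_r` and the sample lists are assumptions introduced here.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (the n - 1 factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Made-up illustrative data, not taken from the dataset.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_xy = pearson_r(x, y)
# Shifting and rescaling y (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r(x, [10 * v + 3 for v in y])
# Flipping the sign of y flips the sign of r.
r_flipped = pearson_r(x, [-v for v in y])
print(r_xy, r_rescaled, r_flipped)
```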

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
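
A minimal sketch of this rank-based calculation follows, using made-up data. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, and assigning tied values the average of their positions is an assumption (a common convention that the text above does not specify).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this toy input the rank correlation is perfect while Pearson's r stays below 1, which is the distinction the paragraph above draws.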

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
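
The pair-counting definition above can be sketched directly. This is the simple form often called tau-a; libraries frequently use tie-adjusted variants instead, so results on tied data such as 7-point scales may differ. The function name `kendall_tau_a` and the sample data are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1  # both variables order the pair the same way
        elif product < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)


# Made-up data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```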

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

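     SI_1  SI_2  SI_3  PEB_1  PEB_2  PEB_3  PEB_4  PEB_5  SoIn_1  SoIn_2  SoIn_3  SoIn_4  SoIn_5  CA_1  CA_2  CA_3
0    4     4     5     7      7      5      6      7      4       6       4       4       5       7     7     7
1    7     7     7     7      7      7      7      7      7       7       7       7       7       7     7     7
2    3     5     4     6      4      3      6      5      4       4       2       4       4       5     4     6
3    1     2     2     4      4      4      4      3      2       2       2       2       4       7     2     5
4    3     7     7     7      4      2      4      6      1       1       1       5       4       4     7     5
5    5     6     5     7      6      5      7      4      6       5       3       6       6       5     5     5
6    5     5     6     7      7      5      5      6      2       1       4       5       5       7     7     7
7    7     4     7     7      7      4      7      7      7       7       7       7       7       7     7     7
8    3     1     3     6      6      5      7      6      7       5       4       3       3       6     6     6
9    1     5     1     7      1      2      7      5      1       1       1       1       1       3     1     3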

Last rows

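     SI_1  SI_2  SI_3  PEB_1  PEB_2  PEB_3  PEB_4  PEB_5  SoIn_1  SoIn_2  SoIn_3  SoIn_4  SoIn_5  CA_1  CA_2  CA_3
104  4     7     6     7      7      5      7      7      5       7       4       7       7       7     7     7
105  4     7     3     4      5      5      6      2      6       4       3       5       5       5     6     4
106  5     2     4     4      5      3      7      3      4       4       4       2       2       6     6     6
107  3     3     3     3      3      3      3      3      3       3       3       3       3       3     3     3
108  5     5     6     6      5      5      4      5      6       5       4       6       5       7     5     5
109  1     7     3     7      7      6      7      4      1       5       4       4       4       7     6     6
110  7     5     7     7      6      6      7      4      7       4       5       6       6       7     5     7
111  1     5     1     4      6      4      4      2      1       7       1       1       4       7     7     7
112  4     6     4     6      6      6      6      4      5       5       5       6       6       5     6     6
113  2     2     5     5      4      4      6      6      4       6       4       5       4       6     6     6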

Duplicate rows

Most frequent

     SI_1  SI_2  SI_3  PEB_1  PEB_2  PEB_3  PEB_4  PEB_5  SoIn_1  SoIn_2  SoIn_3  SoIn_4  SoIn_5  CA_1  CA_2  CA_3  count
3    7     7     7     7      7      7      7      7      7       7       7       7       7       7     7     7     4
0    4     5     6     7      2      2      6      6      2       4       1       2       1       7     5     7     2
1    5     5     3     6      5      4      6      3      5       5       4       5       6       5     5     6     2
2    7     7     7     7      4      4      7      7      7       7       7       7       7       7     3     4     2
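
The group sizes in the table above can be reconciled with the 6 (5.3%) duplicate rows reported in the overview. The sketch below assumes that a duplicate is every occurrence of a row after its first; that reading is consistent with the reported figures but is not stated in the report. The helper name `count_duplicate_rows` and the toy rows are hypothetical.

```python
from collections import Counter


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row (every occurrence after the first)."""
    counts = Counter(tuple(row) for row in rows)
    return sum(count - 1 for count in counts.values())


# Group sizes taken from the "count" column of the table above.
group_sizes = [4, 2, 2, 2]
extra_copies = sum(size - 1 for size in group_sizes)

# Share of the 114 observations, as a percentage.
share = 100 * extra_copies / 114
print(extra_copies, round(share, 1))

# Toy check of the helper on made-up rows: [1, 2] appears three times.
toy_rows = [[1, 2], [1, 2], [3, 4], [1, 2]]
toy_duplicates = count_duplicate_rows(toy_rows)
```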